Run Args

Meta arguments

Now that you have a concept of the Function, Dataset, and Runners, it’s time to talk about run_args. These are “extra” arguments that travel alongside your function, but do not interact directly with it.

This can create some confusing terminology, so let’s be explicit. Whenever args are discussed, this refers to the actual Function arguments, i.e. what the Runner is storing. run_args deal with things like the remote directory and resource requests.

Native Arguments

Here we will cover the run_args that are natively understood by a run.

While you can implement your own functionality (more on that later), the following arguments are common to all runs.

Skip & Force

This is a special “contextual” run arg. There are three situations where these args are relevant.

Dataset init

By default, when defining Dataset(...), a search is done to see if a matching Dataset has already been created. If this is the case, the current creation will be “skipped”, and the Dataset will instead be unpacked from the previous state.

Setting skip=False will ensure a new Dataset is created, deleting the old database in the process.

force=True is ignored here, only skip has any function.

Note

It is advised to use Dataset(..., skip=False) while testing, as it ensures consistent behaviour. Only once you care about the result should you drop this argument (or change it to True).

Run append

By default, a Runner that already exists within a Dataset cannot be added again.

With skip=False, the runner will be appended anyway. This does not overwrite the existing runner, and allows for multiple copies of the same run.

force=True acts as an inverted alias of skip, i.e. force=True is equivalent to skip=False.
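Since force is simply the inverse of skip, the pair can be normalised into a single flag. The following is an illustrative sketch of that relationship, not remotemanager’s actual internals:

```python
def resolve_skip(skip=True, force=False):
    """Normalise the skip/force pair into a single skip flag.

    Illustration only: force=True is treated as an inverted
    alias of skip, so it behaves exactly like skip=False.
    """
    if force:
        return False
    return skip

# force=True behaves like skip=False
assert resolve_skip(force=True) == resolve_skip(skip=False)
# default behaviour is to skip
assert resolve_skip() is True
```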

Run()

When running a Dataset, is_finished is called to get the states of any runners. Any that are already running or have completed will not be submitted.

skip=False allows runners which are already submitted to be resubmitted.

In general, force=True functions as an inverted alias of skip. However, there is an additional keyword argument, force_ignores_success, which is required to resubmit runners considered to have “succeeded”. This is an extra safeguard against overwriting data.

Important

force_ignores_success is required for skip=False/force=True to function on runners which are considered to have succeeded. This is a runner which has successfully returned a result file.
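The decision rules above can be sketched as a small pure function. This is an illustration of the described behaviour only, not the library’s implementation, and the state names are assumptions:

```python
def should_submit(state, skip=True, force=False, force_ignores_success=False):
    """Decide whether a runner should be (re)submitted.

    Sketch of the rules described above. `state` is one of
    "unsubmitted", "running" or "succeeded" (hypothetical labels).
    """
    if state == "unsubmitted":
        return True
    if state == "succeeded":
        # extra safeguard: skip=False/force=True alone is not enough
        return (force or not skip) and force_ignores_success
    # already running/submitted: resubmit only with skip=False/force=True
    return force or not skip

assert should_submit("running") is False               # default: skipped
assert should_submit("running", skip=False) is True    # resubmitted
assert should_submit("succeeded", force=True) is False # safeguard holds
assert should_submit("succeeded", force=True, force_ignores_success=True) is True
```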

Dirs

The most commonly set run_args are the *_dir family. These designate where your run files will end up, and it is recommended to change them from the defaults when doing a full run. remotemanager can create a lot of small files, which can make directory navigation cumbersome, even with proper segmentation.

local_dir

This directory is on your machine, and dictates where the runners will “stage” from. When running, files are first written to this directory then sent to the remote.

remote_dir

This directory is the main one on the remote machine, and is where all the main run files are copied to.

run_dir

This directory is not always used. It exists within the remote_dir, and is where the run will actually be executed.

Warning

Be careful using run_dir with runs that need to interact with the file system. A good example is sending extra files: since the run executes inside run_dir, you will need to access files placed in remote_dir via a relative path such as ../file.
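To make the warning concrete, here is a quick path sketch (the directory names are hypothetical) showing why a file in remote_dir must be reached via `../` from inside run_dir:

```python
import posixpath

remote_dir = "temp_runner_remote"              # where extra files land
run_dir = posixpath.join(remote_dir, "run")    # hypothetical run_dir inside it

# from inside run_dir, an extra file must be reached one level up
path_from_run = "../extra_input.txt"
resolved = posixpath.normpath(posixpath.join(run_dir, path_from_run))

assert resolved == posixpath.join(remote_dir, "extra_input.txt")
```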

Run modifiers

Asynchronous

True by default; this ensures that runs are executed in parallel. Set it to False to force a dataset to execute its runners one after another (this only functions when submitter="bash").

Argument Hierarchy

run_args can be set at multiple levels.

  • Dataset - This is the “top level” storage, all runners inherit from this dictionary

  • Runner - Runners can have their own “local” run_args, just for that run

  • Run/Temporary - when running a Dataset, you can also pass arguments into the run. These are considered “temporary” arguments, and will be dropped after the run completes.

Let’s demonstrate what this looks like, starting with the defaults:

[1]:
from remotemanager import Dataset

def function(inp):
    return inp

# skip=False will be used heavily throughout the tutorials
# it is recommended that you also do so when experimenting
ds = Dataset(function, skip=False)

ds.append_run({"inp": 1})
appended run runner-0
[2]:
print(ds.run_args)
{'skip': True, 'force': False, 'asynchronous': True, 'local_dir': 'temp_runner_local', 'remote_dir': 'temp_runner_remote'}

The defaults here mean:

  • Runs will try to skip (if they already have results)

  • Runs will not be forced

  • Jobs will be run asynchronously

  • The local staging directory is temp_runner_local

  • The remote running directory is temp_runner_remote

By comparison, the Runner object will appear to have no run_args, since these are “overrides” that are set at the runner level.

[3]:
print(ds.runners[0].run_args)
{}

When running a job, these arguments are combined into a single dictionary.

This can be seen at derived_run_args:

[4]:
print(ds.runners[0].derived_run_args)
{'skip': True, 'force': False, 'asynchronous': True, 'local_dir': 'temp_runner_local', 'remote_dir': 'temp_runner_remote'}

Setting run_args

Now that we know the default values, how do we change them?

Firstly, any argument passed to Dataset, append_run, or run() that is not part of those functions’ signatures will be treated as a run_arg. However, you can also update them after initialisation.

There are multiple ways to update or set args. The most obvious is to directly update the run_args dictionaries, but there are also functions that do this more explicitly.

Note

These functions exist on both Dataset and Runner.

Let’s start by demonstrating the direct method:

[5]:
ds.run_args["direct"] = True

for k, v in ds.run_args.items():
    print(k, v)
skip True
force False
asynchronous True
local_dir temp_runner_local
remote_dir temp_runner_remote
direct True

set_run_args

This function can take a list of keys and values, and set them. You can also pass a single (key, val) pair.

[6]:
ds.set_run_args(["a", "b", "c"], [1, 2, 3])

ds.set_run_args("d", 4)

for k, v in ds.run_args.items():
    print(k, v)
skip True
force False
asynchronous True
local_dir temp_runner_local
remote_dir temp_runner_remote
direct True
a 1
b 2
c 3
d 4

update_run_args

This function takes a dictionary of arguments and updates the inner run_args with it. Useful for setting a large set of arguments at once.

[7]:
ds.update_run_args({"a": 10, "b": 11, "c": 12, "d": 13})

for k, v in ds.run_args.items():
    print(k, v)
skip True
force False
asynchronous True
local_dir temp_runner_local
remote_dir temp_runner_remote
direct True
a 10
b 11
c 12
d 13

Custom run_args

Unhandled run_args will be ignored by a run. However, if you are using a Computer that accepts arguments for its script() method, they can be used there.

The main use for this dynamic ability is for scheduler resources, and this is covered in depth within the Scheduler Tutorial.

Runner overrides

The run_args of Runner act as “local” overrides for whatever is set in Dataset.

We can demonstrate this by setting a value on the runner.

[8]:
ds.runners[0].run_args["d"] = "foo"

print("Dataset args:", ds.run_args.get("d", None))

print("Runner args:", ds.runners[0].run_args.get("d", None))

print("Derived args:", ds.runners[0].derived_run_args.get("d", None))
Dataset args: 13
Runner args: foo
Derived args: foo

At the Dataset level, the value of d is still 13. However, on the runner where we overrode the value, it is now “foo”.

Any other runners will retain the Dataset-level value.

[9]:
ds.append_run({"inp": 2})

print("Derived args:", ds.runners[1].derived_run_args.get("d", None))
appended run runner-1
Derived args: 13
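The override behaviour seen above can be thought of as a simple dictionary merge, with more specific levels taking precedence. This is an illustration of the precedence order only, not the library’s internals:

```python
dataset_args = {"d": 13, "remote_dir": "temp_runner_remote"}  # top level
runner_args = {"d": "foo"}   # runner-level "local" override
temporary_args = {}          # anything passed to run()

# runner values override dataset values; run() values override both
derived = {**dataset_args, **runner_args, **temporary_args}

assert derived["d"] == "foo"
assert derived["remote_dir"] == "temp_runner_remote"
```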

Temporary Run() Args

As was mentioned previously, you can also pass the same args to the Run() call of a Dataset. While difficult to demonstrate here, you can verify it by setting a remote_dir to something and then updating that arg within Run().

Args set this way are discarded after the run, and are considered “temporary” by the Dataset, whereas args set any other way are saved when the Dataset is.
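A rough sketch of this persistence rule, again as plain dictionaries rather than the library’s actual machinery:

```python
stored_args = {"remote_dir": "temp_runner_remote"}  # saved with the Dataset
temporary_args = {"remote_dir": "scratch_remote"}   # passed to run() only

# for the duration of the run, temporary args take precedence
run_args = {**stored_args, **temporary_args}
assert run_args["remote_dir"] == "scratch_remote"

# after the run, temporary args are dropped; stored args persist unchanged
assert stored_args["remote_dir"] == "temp_runner_remote"
```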